# Multimodal Control
Spatialvla 4b 224 Sft Fractal
MIT
SpatialVLA is a vision-language-action model fine-tuned on the fractal dataset, primarily used for robot control tasks.
Text-to-Image
Transformers, English

IPEC-COMMUNITY
375
0
Openvla 7b Oft Finetuned Libero Object
MIT
OpenVLA-OFT is a vision-language-action model fine-tuned with an optimized recipe that significantly improves inference speed and task success rate.
Multimodal Fusion
Transformers

moojink
403
1
Stable Diffusion 3.5 Large Controlnet Blur
Other
A blur ControlNet for the Stable Diffusion 3.5 Large model, used to generate images conditioned on blurred input images.
Image Generation, English
stabilityai
603
11
Hico T2I
Apache-2.0
HiCo is a hierarchical controllable diffusion model specifically designed for layout-to-image generation tasks.
Image Generation, English
qihoo360
29
2
CSGO
Apache-2.0
CSGO is a PyTorch implementation for text-to-image generation, supporting image-driven style transfer, text-driven stylized synthesis, and text-editing-driven stylized synthesis.
Image Generation, English
InstantX
500
34
Featured Recommended AI Models